Distributed Framework for Data Mining As a Service on Private Cloud
Abstract
Data mining research faces two major challenges: (i) automated mining and (ii) mining of distributed data. Conventional mining techniques are centralized, so the data must first be accumulated at a central location, which adds time for data collection. A mining tool must also be installed on the computer before data mining can be performed, and mining is done by specialized analysts who have access to such tools. This approach is not optimal when the data is distributed over a network, so distributed data mining calls for a different framework to improve efficiency. Moreover, the volume of accumulated data grows exponentially over time and is difficult to mine on a single computer, because personal computers are limited in computation capability and storage capacity. Cloud computing can be exploited for compute-intensive and data-intensive applications. Since data mining algorithms are both compute- and data-intensive, cloud-based tools can provide an infrastructure for distributed data mining. This paper uses cloud computing to support distributed data mining. We propose a cloud-based data mining model that provides mass data storage together with a distributed data mining facility. The solution runs distributed data mining on the Hadoop framework through an interface that executes the algorithm on a specified number of nodes without any user-level configuration. Hadoop is configured over private servers, and clients can process their data through a common framework from anywhere in the private network. The data to be mined can either be chosen from the cloud data server or uploaded from private computers on the network. We observe that the framework processes large datasets in less time than a single system.
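The abstract does not say which mining algorithm the framework runs or how its interface splits work across nodes. As a hedged illustration of the general pattern that Hadoop MapReduce follows (partition the data, mine each partition locally, merge the partial results), the sketch below counts item support, the first pass of frequent-itemset mining, across a user-chosen number of "nodes". All names (`split_into_partitions`, `map_count`, `reduce_counts`, `frequent_items`) are hypothetical. Threads from `concurrent.futures` stand in for cluster nodes; a real deployment would submit a MapReduce job to the Hadoop cluster instead.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, List, Sequence

# A transaction is a list of item names (hypothetical data model).
Transaction = Sequence[str]


def split_into_partitions(data: Sequence[Transaction], num_nodes: int) -> List[List[Transaction]]:
    """Round-robin split of the dataset into num_nodes partitions (a stand-in for HDFS blocks)."""
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")
    partitions: List[List[Transaction]] = [[] for _ in range(num_nodes)]
    for i, txn in enumerate(data):
        partitions[i % num_nodes].append(txn)
    return partitions


def map_count(partition: Iterable[Transaction]) -> Counter:
    """Map step: each node counts items in its own partition only."""
    local: Counter = Counter()
    for txn in partition:
        local.update(set(txn))  # count each item at most once per transaction
    return local


def reduce_counts(partials: Iterable[Counter]) -> Counter:
    """Reduce step: merge the per-node partial counts into global support counts."""
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts rather than replacing them
    return total


def frequent_items(data: Sequence[Transaction], num_nodes: int, min_support: int) -> Dict[str, int]:
    """Run the map step on num_nodes workers, reduce, and keep items meeting min_support."""
    partitions = split_into_partitions(data, num_nodes)
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partials = list(pool.map(map_count, partitions))
    support = reduce_counts(partials)
    return {item: count for item, count in support.items() if count >= min_support}


if __name__ == "__main__":
    transactions = [["milk", "bread"], ["bread", "butter"], ["milk", "bread", "butter"], ["eggs"]]
    print(sorted(frequent_items(transactions, num_nodes=2, min_support=2).items()))
    # prints [('bread', 3), ('butter', 2), ('milk', 2)]
```

Because support counts are additive, the merged result does not depend on how many nodes are used; only the amount of work per node changes, which is the property that lets such a framework scale a job to a specified number of nodes.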
Publication date: 2014